To address the low hardware resource utilization and high latency of Convolutional Neural Network (CNN) inference on heterogeneous platforms, a self-adaptive partitioning and scheduling method for CNN inference models was proposed. Firstly, the key operators of the CNN were extracted by traversing the computational graph and then used to adaptively partition the model, making the scheduling strategy more flexible. Then, using performance measurements and a critical path-greedy search algorithm, the optimal running load for each sub-model was selected according to its runtime characteristics on the CPU-GPU heterogeneous platform, which improves sub-model inference speed. Finally, the cross-device scheduling mechanism of TVM (Tensor Virtual Machine) was used to configure the dependencies and running loads of the sub-models, achieving adaptive scheduling of model inference and reducing the communication delay between devices. Experimental results show that, compared with TVM operator-level optimization, the proposed method improves inference speed by 5.88% to 19.05% on GPU and by 45.45% to 311.46% on CPU, with no loss of inference accuracy.
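The two core steps, partitioning at key operators and critical-path-guided greedy placement of sub-models on CPU or GPU, can be illustrated with a minimal plain-Python sketch. It does not use TVM and is not the authors' algorithm. Several parts are hypothetical assumptions: the operator chain is taken as already topologically ordered, `conv2d` and `dense` stand in for the "key operators", the per-device latencies stand in for the performance measurements, and a constant `TRANSFER_MS` models inter-device communication. The critical path is approximated with a HEFT-style "bottom level" priority, which is a swapped-in technique, and each ready sub-model is greedily placed on the device that gives the earliest finish time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

TRANSFER_MS = 0.5  # hypothetical fixed cost when a dependency ran on another device


@dataclass
class SubModel:
    name: str
    cost: Dict[str, float]  # hypothetical measured latency (ms) per device
    deps: List[str] = field(default_factory=list)


def partition(ops: List[Tuple[str, str]], key_types=("conv2d", "dense")) -> List[List[str]]:
    """Start a new segment at every key operator (assumes a linear, topologically ordered graph)."""
    segments, current = [], []
    for op_name, op_type in ops:
        if op_type in key_types and current:
            segments.append(current)
            current = []
        current.append(op_name)
    if current:
        segments.append(current)
    return segments


def bottom_levels(subs: Dict[str, SubModel]) -> Dict[str, float]:
    """Longest path from each sub-model to a sink, weighting nodes by their fastest device."""
    children = {n: [] for n in subs}
    for n, s in subs.items():
        for d in s.deps:
            children[d].append(n)
    memo: Dict[str, float] = {}

    def bl(n: str) -> float:
        if n not in memo:
            memo[n] = min(subs[n].cost.values()) + max((bl(c) for c in children[n]), default=0.0)
        return memo[n]

    for n in subs:
        bl(n)
    return memo


def schedule(subs: Dict[str, SubModel], devices=("cpu", "gpu"), transfer=TRANSFER_MS):
    """Greedy list scheduling: highest critical-path priority first, earliest-finish device."""
    prio = bottom_levels(subs)
    device_free = {d: 0.0 for d in devices}
    finish: Dict[str, float] = {}
    placed: Dict[str, str] = {}
    remaining = set(subs)
    while remaining:
        ready = [n for n in remaining if all(d in finish for d in subs[n].deps)]
        n = max(ready, key=lambda x: (prio[x], x))  # name breaks priority ties
        best = None
        for dev in devices:
            # Data is ready once every dependency finished (plus transfer if it ran elsewhere).
            data_ready = max(
                (finish[d] + (transfer if placed[d] != dev else 0.0) for d in subs[n].deps),
                default=0.0,
            )
            end = max(data_ready, device_free[dev]) + subs[n].cost[dev]
            if best is None or end < best[0]:
                best = (end, dev)
        end, dev = best
        finish[n], placed[n] = end, dev
        device_free[dev] = end
        remaining.remove(n)
    return placed, max(finish.values())


# Hypothetical example data, not measurements from the paper.
EXAMPLE_OPS = [("conv1", "conv2d"), ("relu1", "relu"), ("pool1", "max_pool2d"),
               ("conv2", "conv2d"), ("relu2", "relu"), ("fc", "dense"), ("softmax", "softmax")]
EXAMPLE_SUBS = {
    "s0": SubModel("s0", {"cpu": 4.0, "gpu": 1.0}),
    "s1": SubModel("s1", {"cpu": 2.0, "gpu": 3.0}, ["s0"]),
    "s2": SubModel("s2", {"cpu": 6.0, "gpu": 2.0}, ["s0"]),
    "s3": SubModel("s3", {"cpu": 3.0, "gpu": 1.0}, ["s1", "s2"]),
}

print(partition(EXAMPLE_OPS))
print(schedule(EXAMPLE_SUBS))
```

With these made-up numbers, `partition` yields three segments headed by `conv1`, `conv2` and `fc`. The scheduler keeps `s0`, `s2` and `s3` on the GPU but places `s1` on the CPU, where it runs in parallel with `s2`, so the makespan is 5.0 ms. The transfer term is what discourages needless device switches, in the spirit of reducing communication delay between devices.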